- This is a preliminary version of a text for the x86 FAQ (it was
- also sent to alt.lang.asm; this group seems more appropriate).
- Please send me suggestions for useful improvements.
-
-
- Real and Protected Modes.
-
- Beginning with the 80286, Intel CPUs can work in Protected Mode
- (older CPUs have Real Mode only). For compatibility reasons, all CPUs
- start in Real Mode after reset. The main differences between the modes
- of Intel CPUs are presented below. There are three modes: Real Mode,
- Protected Mode and Virtual 8086 Mode (frequently called RM, PM and
- VM86, respectively; also, 286+ (386+) will mean an Intel 80286 (80386)
- or better). VM86 exists on 386+ only.
-
- These modes differ in memory addressing (PM can address all memory;
- RM cannot, unless segment limits were enlarged in PM on a 386+ before
- returning to RM; VM86 cannot either, unless the PM code supporting it
- remaps memory, which is how EMM386 works), instruction set (some
- instructions, e.g. LAR, LSL, VERR, VERW, ARPL, LLDT and LTR, are not
- allowed in RM), privileges (some operations can be forbidden in PM for
- less privileged code, and many operations are forbidden in VM86) and
- interrupt handling. PM supports multitasking, and PM can run tasks in
- VM86. VM86 cannot function alone: it needs PM code supporting it. It
- works like an 8086 CPU with a few enhancements, except that interrupts
- are serviced through PM. PM cannot store data to a code segment (except
- through a writable data alias; MOV CS:[BX],AX is illegal in PM). VM86
- and PM on 386+ can have selective I/O port access restrictions (some
- ports can be accessed without causing an exception and others cannot).
-
-
- Memory addressing and Paging.
-
- In any mode, an instruction specifies a segment and an offset for each
- memory address it references, e.g. for MOV AX,ES:[BX+SI+1] the segment
- is ES and the offset BX+SI+1; for PUSH SI the segment is SS and the
- offset SP-2; the instruction itself is fetched from segment CS at
- offset IP. The address is translated to a Linear Address by adding the
- offset to the base of the segment, and the Linear Address is then
- translated to the Physical Address, which the CPU outputs on its
- address pins.
-
- In RM or VM86, the base is segment*10h; in PM the base is taken from a
- descriptor table (LDT or GDT) and can have any value.
- In PM the value in a segment register is called a "selector": its bits
- 15-3 are the descriptor index (the byte offset in the LDT or GDT is
- index*8, i.e. the selector with bits 2-0 cleared), bit 2 is 0 for the
- GDT and 1 for the LDT, and bits 1-0 specify the RPL (Requested
- Privilege Level).
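-
- A minimal C sketch of the two ways a segment base is obtained, as
- described above; the names rm_linear, selector_fields and
- decode_selector are hypothetical. It only does the arithmetic: it does
- not model A20 wrap-around or the actual descriptor table lookup.

```c
#include <stdint.h>

/* RM / VM86: linear address = segment * 10h + offset. The result can
   exceed 0FFFFFh (up to 10FFEFh); whether that reaches memory above
   1MB depends on the A20 line, which is not modeled here. */
static uint32_t rm_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* PM selector fields (hypothetical helper type). */
typedef struct {
    unsigned index;   /* bits 15-3: descriptor number in the table */
    unsigned offset;  /* byte offset in the table: index * 8 */
    int      in_ldt;  /* bit 2: 0 = GDT, 1 = LDT */
    int      rpl;     /* bits 1-0: Requested Privilege Level */
} selector_fields;

static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index  = sel >> 3;
    f.offset = sel & 0xFFF8u;
    f.in_ldt = (sel >> 2) & 1;
    f.rpl    = sel & 3;
    return f;
}
```

- For example, rm_linear(0B800h, 0010h) gives 0B8010h, and selector
- 002Fh decodes to index 5 (table offset 28h) in the LDT with RPL 3.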
-
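- Unless Paging (possible in PM and VM86, on 386+ only) is enabled, the
- Physical Address equals the Linear Address. With Paging, the low 12
- bits of the Linear Address go to the Physical Address unchanged, and
- the other bits index two levels of page tables (bits 31-22 select an
- entry in the page directory, which points to a page table; bits 21-12
- select an entry in that page table, which points to the page).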
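-
- The split of a Linear Address into its three fields can be sketched in
- C as follows (la_parts and split_linear are hypothetical names):

```c
#include <stdint.h>

typedef struct {
    uint32_t dir;     /* bits 31-22: index into the page directory */
    uint32_t table;   /* bits 21-12: index into the page table */
    uint32_t offset;  /* bits 11-0: copied to the Physical Address */
} la_parts;

static la_parts split_linear(uint32_t la)
{
    la_parts p;
    p.dir    = la >> 22;
    p.table  = (la >> 12) & 0x3FFu;
    p.offset = la & 0xFFFu;
    return p;
}
```

- E.g. 0C0801234h splits into directory entry 302h, table entry 1 and
- offset 234h.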
-
- Paging can also restrict access to some pages (so that non-privileged
- code can only read them or has no access at all), or mark pages as
- non-present: the system then assigns physical memory to such a page
- and loads its contents, transparently to the program, when access to
- its Linear Address is attempted (the access causes a page fault which
- the system services).
- Note the Linear Address space is 4GB on 386+, and probably no system
- has that much physical memory: Paging lets the system simulate it has.
-
- Each segment also has a limit. Initially the limit is 0FFFFh for all
- segment registers, and it cannot be changed in RM or VM86. In PM it is
- loaded from the LDT or GDT when a segment register is loaded. On the
- 286 in PM the limit can be up to 0FFFFh; on 386+ in PM it can be up to
- 0FFFFFFFFh. PM also allows "expand down" segments, whose valid offsets
- run from limit+1 up to the maximum possible limit (0FFFFh or
- 0FFFFFFFFh, depending on the segment type).
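-
- A C sketch of the limit check for both segment kinds, assuming `limit`
- is the limit in bytes (already scaled if the descriptor uses page
- granularity) and `big` selects the 0FFFFFFFFh upper bound; in_segment
- is a hypothetical name. Every byte of a multi-byte access must lie in
- the valid range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool in_segment(uint32_t offset, uint32_t size, uint32_t limit,
                       bool expand_down, bool big)
{
    assert(size >= 1);
    uint64_t first = offset;
    uint64_t last  = (uint64_t)offset + size - 1;
    if (!expand_down)
        return last <= limit;                /* valid: 0 .. limit */
    uint64_t top = big ? 0xFFFFFFFFu : 0xFFFFu;
    return first > limit && last <= top;     /* valid: limit+1 .. top */
}
```

- So a word at offset 0FFFFh fails in a normal 64kB segment, and an
- expand-down segment with limit 0FFFh accepts offsets 1000h..0FFFFh.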
-
-
- Privilege Levels and Rules.
-
- In RM, the CPU has full privileges. In PM and VM86, they can be
- restricted, which reduces the damage buggy code can cause.
-
- Base rules: code cannot access data more privileged than itself, nor
- call code less privileged than itself (although it can return to less
- privileged code). Additionally, a call to more privileged code cannot
- use any target address the caller wants, only entry points specified
- by the system; and a call to more privileged code must switch stacks,
- to make sure enough stack space is available for the called code (so
- the caller cannot make it crash).
-
- There are 4 levels: level 0 has full privilege (except for the Debug
- Registers, which can be protected from access even at level 0; some
- instructions are reserved for level 0 only), and the bigger the level
- number, the fewer the privileges. A few terms are used for Privilege
- Levels: CPL - Current PL, DPL - Descriptor PL, RPL - Requested PL (in
- a selector), IOPL (in flags) - the maximum CPL allowed to execute
- I/O-sensitive instructions (IN, OUT, INS, OUTS, CLI, STI; in VM86 also
- PUSHF, POPF, INT n and IRET).
-
- Unless a Conforming Code segment is accessed, privilege rules require
- max(CPL,RPL)<=DPL. To execute code directly (by FAR CALL or JMP), a
- Conforming segment needs DPL<=CPL, and a non-conforming one needs
- DPL=CPL and RPL<=CPL; so, for example, a procedure cannot call a less
- privileged one.
-
- To transfer control to code with a numerically lower PL (more
- privileged), a CALL must go through a call gate, which is an entry in
- the GDT or LDT. This requires max(CPL,RPL)<=gate_DPL, while the code
- the gate refers to may have code_DPL<gate_DPL; privilege rules also
- require target_code_DPL<=CPL for CALL, and target_code_DPL=CPL for JMP
- (unless the target is Conforming).
-
- A call to more privileged non-conforming code switches stacks, so TR
- must point to a valid TSS, which holds the new SS:[E]SP. The old
- SS:[E]SP are pushed on the new stack, then the parameters (as many as
- the call gate specifies) are copied, and finally CS:[E]IP are pushed.
- On return, the CPU sees that the RPL of the CS on the stack is > CPL
- and switches the stack back (if it is =, there is no stack switch; <
- is prohibited by privilege rules). For this to work, the number of
- parameter bytes removed by RET n must match the count in the call gate.
-
- For a stack segment, DPL must equal CPL (so no crash is possible in a
- more privileged mode because of an incorrect stack set up in a less
- privileged one, and less privileged code has no access to the more
- privileged stack).
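-
- The rules above can be summarized as predicates. This is a simplified
- C sketch (the function names are hypothetical; descriptor type,
- presence and readability checks are left out), not a complete model of
- the CPU's checks.

```c
#include <assert.h>
#include <stdbool.h>

static int pl_max(int a, int b) { return a > b ? a : b; }

/* Loading a data segment register (DS, ES, FS, GS). */
static bool can_load_data(int cpl, int rpl, int dpl)
{
    assert(cpl >= 0 && cpl <= 3 && rpl >= 0 && rpl <= 3);
    return pl_max(cpl, rpl) <= dpl;
}

/* Direct far CALL/JMP to a code segment (no gate). */
static bool can_transfer_direct(int cpl, int rpl, int dpl, bool conforming)
{
    if (conforming)
        return dpl <= cpl;             /* CPL is not changed */
    return dpl == cpl && rpl <= cpl;
}

/* Far CALL or JMP through a call gate. */
static bool can_use_call_gate(int cpl, int rpl, int gate_dpl, int code_dpl,
                              bool is_call, bool conforming)
{
    if (pl_max(cpl, rpl) > gate_dpl)
        return false;
    if (is_call || conforming)
        return code_dpl <= cpl;        /* CALL may raise privilege */
    return code_dpl == cpl;            /* JMP: same level only */
}

/* Loading SS: DPL must equal CPL (RPL must equal CPL too). */
static bool can_load_ss(int cpl, int rpl, int dpl)
{
    return dpl == cpl && rpl == cpl;
}
```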
-
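- RPL lets the system prevent user code from passing a pointer which is
- invalid in user mode but valid in system mode: the system sets the RPL
- of such a selector to the user's privilege level, so that any access
- through it is checked as if made by user code and causes an access
- violation error. This is done with the ARPL instruction, which raises
- the RPL of a selector to the RPL of another one and sets ZF if it
- changed the selector (to inform the OS that an invalid access might
- have been attempted). The OS uses it to set the RPL of the pointer to
- the CPL of the application code (e.g. taken from the caller's CS saved
- on the stack).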
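-
- ARPL's effect can be emulated in C as below (a sketch; `arpl` is a
- hypothetical name, and its return value stands for ZF). An OS would
- pass the caller's CS from the stack as `src`.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARPL dest,src: if dest's RPL is more privileged (numerically smaller)
   than src's, raise it to src's RPL and report ZF=1; else ZF=0. */
static bool arpl(uint16_t *dest, uint16_t src)
{
    if ((*dest & 3u) < (src & 3u)) {
        *dest = (uint16_t)((*dest & ~3u) | (src & 3u));
        return true;
    }
    return false;
}
```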
-
- The access a program has to a segment can be checked with the VERR,
- VERW, LAR and LSL instructions. They all set ZF if the access is
- allowed and clear it if not. The first two simply verify read or write
- access, LAR gets the access rights bits of the segment's descriptor,
- and LSL gives the segment limit. These instructions make it possible
- to check what would cause an access violation, instead of getting the
- error.
-
- Conforming code segments can be used without high privilege; they are
- meant for libraries shared between levels (otherwise a separate copy
- would be needed for every level). Data kept in them can be read from
- any PL (provided the segment is readable), and their code can be
- entered (by jump or call) from the same or a less privileged PL; in
- that case the CPL is NOT changed by the jump or call. Conforming code
- cannot be executed from a more privileged PL: it is not trusted enough
- to run with CPL<DPL (greater privilege than the system tables define).
- I'm not sure how a return from non-conforming to conforming code works;
- it seems the RPL of the CS on the stack determines the new CPL (which
- may be less privileged than the conforming code segment's DPL).
-
- Some instructions are allowed at CPL=0 only: Clear Task-Switched Flag
- (CLTS), Halt Processor (HLT), loading some system registers (GDTR,
- IDTR, LDTR, MSW, TR) and any access to the CRx, DRx and TRx registers.
- Others require CPL<=IOPL: IN, INS, OUT, OUTS, CLI, STI (on 386+, I/O
- instructions may still be allowed by the TSS I/O bit map). POPF
- behavior also depends on CPL: it never changes VM; if CPL>0, IOPL is
- not changed; if CPL>IOPL, IF (interrupt enable) is not changed. No
- exception is raised in these cases.
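-
- A C sketch of which flags a protected-mode (not VM86) POPFD actually
- changes, following the rules above; popf_pm is a hypothetical name,
- and RF and reserved bits are ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF   0x00000200u  /* bit 9: interrupt enable */
#define FL_IOPL 0x00003000u  /* bits 13-12 */
#define FL_VM   0x00020000u  /* bit 17 */

static uint32_t popf_pm(uint32_t old_flags, uint32_t popped, int cpl)
{
    assert(cpl >= 0 && cpl <= 3);
    uint32_t iopl = (old_flags & FL_IOPL) >> 12;
    uint32_t keep = FL_VM;             /* never changed by POPF */
    if (cpl > 0)
        keep |= FL_IOPL;               /* IOPL changes at CPL 0 only */
    if ((uint32_t)cpl > iopl)
        keep |= FL_IF;                 /* IF changes only if CPL <= IOPL */
    return (old_flags & keep) | (popped & ~keep);
}
```

- At CPL 3 with IOPL 0, popping 3002h over flags 0202h leaves 0202h:
- neither IOPL nor IF changes, and no exception is raised.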
-
-
- Interrupts.
-
- In every mode there is an array containing information about what
- action is to be taken when an interrupt occurs. Its first entry
- corresponds to INT 0, the next to INT 1, and so on. It is called the
- IDT (Interrupt Descriptor Table; in RM it is often called the interrupt
- vector table). In RM, each entry in the IDT is simply the far address
- of an interrupt service routine. Initially the IDT is located at
- address 0 and has 100h entries (400h bytes; on some CPUs its limit is
- 0FFFFh, but the remainder is not used in RM). On pre-80286 CPUs the
- IDT address and size cannot be changed; on 286+ they can be loaded and
- stored using the LIDT and SIDT instructions.
-
- In PM the IDT has 8-byte entries, which can be interrupt, trap or task
- gates. A trap gate differs from an interrupt gate by leaving the
- interrupt flag as it was in the interrupted code (an interrupt gate
- clears it). A task gate causes a switch to another task. All gates
- have DPLs, and an INT instruction causes a General Protection error if
- CPL > the DPL of the gate. Other interrupt sources (hardware interrupts
- and exceptions), however, are treated as "CPL 0": they can use any gate.
-
- Some conditions cause an Exception. For the 80386 they are: divide
- error (0), debug exceptions (1), non-maskable interrupt (2), breakpoint
- (3), overflow (4, on INTO), bounds check (5, on BOUND), invalid opcode
- (6), coprocessor not available (7), double fault (8,E), coprocessor
- segment overrun (9,P), invalid TSS (10,PE), segment not present
- (11,PE), stack error (12,E), general protection error (13,E), page
- fault (14,PE) and coprocessor error (16). Those marked P can occur in
- PM and VM86 only; those marked E push an error code on the stack if
- they occur in PM or VM86 (so the stack is: error code, IP, CS, flags).
-
- The error code is usually either 0 or the selector causing the
- exception (when the selector is invalid or inaccessible), with flags
- in its low bits: bit 0 means an external source, bit 1 means the
- selector refers to the IDT, bit 2 (when bit 1 is clear) means the LDT.
- For a page fault it is a set of flags (bits 3-31 undefined): bit 0 is
- set for a page protection violation (clear for a non-present page),
- bit 1 for a write, and bit 2 for an access from user mode. Most
- exceptions push the IP of the instruction causing them, except 3, 4
- and 9, which push the IP of the next instruction.
- Note: an interrupt cannot be serviced by a handler less privileged
- (with a numerically greater PL) than the CPL, unless via a task
- switch; an attempt to do so causes a General Protection error.
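-
- A C sketch decoding the two error code formats described above (the
- type and function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     external;  /* bit 0: caused by an external event */
    bool     idt;       /* bit 1: index refers to an IDT gate */
    bool     ldt;       /* bit 2: LDT rather than GDT (when idt is false) */
    unsigned index;     /* bits 15-3: descriptor index (vector if idt) */
} selector_error;

static selector_error decode_selector_error(uint32_t code)
{
    selector_error e;
    e.external = (code & 1u) != 0;
    e.idt      = (code & 2u) != 0;
    e.ldt      = !e.idt && (code & 4u) != 0;
    e.index    = (code >> 3) & 0x1FFFu;
    return e;
}

typedef struct {
    bool protection;    /* bit 0: 1 = protection violation, 0 = not present */
    bool write;         /* bit 1: 1 = write, 0 = read */
    bool user;          /* bit 2: 1 = CPL 3, 0 = CPL 0-2 */
} page_fault_error;

static page_fault_error decode_page_fault_error(uint32_t code)
{
    page_fault_error p;
    p.protection = (code & 1u) != 0;
    p.write      = (code & 2u) != 0;
    p.user       = (code & 4u) != 0;
    return p;
}
```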
-
- Interrupt processing in PM is more complicated when the interrupt
- handler has a Privilege Level other than the current code. It is
- handled similarly to a CALL via a gate: the stack is switched, the new
- SS:SP are taken from the TSS, the old SS:SP are pushed on the new
- stack, and then flags, CS, IP and possibly an error code (for some
- exceptions) are pushed.
- In VM86, an interrupt pushes GS, FS, DS, ES, SS, ESP, EFLAGS, CS, EIP
- (and, for some exceptions, an error code) onto the PL 0 stack, and
- loads DS, ES, FS and GS with null selectors. The VM bit is set in the
- saved EFLAGS, telling the handler that the interrupt occurred in VM86.
- Note the IDT must contain only task gates, or 80386 trap or interrupt
- gates pointing to a non-conforming code segment with DPL=0: interrupts
- must be serviced at PL 0 or through a task switch.
- VM86 code itself runs with CPL 3 and is allowed in a 386 task only.
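-
- As a sketch, the PL 0 stack seen by the handler can be pictured as the
- C struct below (lowest address first, no error code pushed; each item
- takes a 32-bit slot, of which only the low 16 bits matter for segment
- registers). The names vm86_frame, came_from_vm86 and vm86_return_linear
- are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t eip, cs, eflags, esp, ss, es, ds, fs, gs;
} vm86_frame;

#define EFLAGS_VM 0x00020000u  /* bit 17 */

/* The saved EFLAGS tells whether the interrupted code ran in VM86. */
static bool came_from_vm86(const vm86_frame *f)
{
    return (f->eflags & EFLAGS_VM) != 0;
}

/* CS*10h + IP of the saved return address, as in RM. */
static uint32_t vm86_return_linear(const vm86_frame *f)
{
    return ((f->cs & 0xFFFFu) << 4) + (f->eip & 0xFFFFu);
}
```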
-
-
- Descriptor Tables (PM only).
-
- The Global Descriptor Table (GDT) can contain descriptors of any type
- except interrupt and trap gates. It is required in PM. The first entry
- in the GDT is not used: it corresponds to the null selector, which can
- be loaded into a segment register but causes an exception if used for
- memory addressing.
-
- The Local Descriptor Table (LDT) can contain only "normal" segment
- descriptors (not e.g. TSS) and call or task gates. Usually every task
- has its own LDT (switched on task switch). The LDT must have a
- descriptor in the GDT.
-
- The Interrupt Descriptor Table (IDT) was discussed in the "Interrupts"
- section.
-
- "Normal" segment descriptors are referenced when a segment register
- is loaded; they describe a memory area and the access allowed to it.
- Bit 2 of the selector selects the table: 0 means GDT, 1 means LDT.
- Other descriptors are Task State Segment (TSS), LDT and gate
- descriptors. TSSs and gates can be referenced "as a code segment",
- e.g. by a far jump or call, which transfers control to the task or
- code segment they reference: it is a kind of indirect jump or call (a
- gate contains the target selector). A TSS, or a task gate pointing to
- a TSS, causes a task switch. A call gate can be used to transfer
- control to more privileged code which is not accessible directly.
- A TSS is also referenced by the LTR (Load Task Register) instruction,
- which is done once during PM initialization. An LDT descriptor is
- loaded into the LDTR register by the LLDT instruction, usually once
- as well (task switches reload LDTR automatically).
-
-
- Segment and System Descriptors.
-
- The following segment types (in byte [descriptor+5]) are supported.
- For all of them, bit 7 means present in memory, and bits 6-5 hold the
- DPL, the maximum CPL which can access the descriptor; this restriction
- applies to all descriptors, not only segments, except conforming
- segments:
-
- 10h+flags - data: bit 1 - writable, bit 2 - expand down
- 18h+flags - code: bit 1 - readable, bit 2 - conforming
-
- For both, bit 0 (accessed) is set by the CPU when the descriptor is
- loaded into a segment register. The descriptor also contains the limit
- in word [0] (in 386 segments extended by bits 0-3 of byte [6]) and the
- base in bytes [2..4] (in 386 segments extended by byte [7]). Byte [6]
- holds a few additional flags: bit 7 - granularity (the limit is in 4kB
- pages; e.g. limit 0 means 0..0FFFh is accessible), bit 6 - 32-bit
- (for code, 32-bit operands and addresses with EIP by default; for
- stacks, ESP is used; for expand-down segments, the upper bound is 4GB),
- bit 5 must be 0, bit 4 is available to the programmer.
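-
- A C sketch that extracts base, limit and flags from the 8 descriptor
- bytes laid out above (seg_desc and decode_desc are hypothetical names):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;     /* effective limit in bytes, after granularity */
    uint8_t  access;    /* byte [5]: P, DPL, type */
    bool     present;   /* byte [5] bit 7 */
    int      dpl;       /* byte [5] bits 6-5 */
    bool     is_code;   /* type 18h..1Fh */
    bool     big;       /* byte [6] bit 6: 32-bit */
} seg_desc;

static seg_desc decode_desc(const uint8_t d[8])
{
    seg_desc s;
    uint32_t raw = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
                 | ((uint32_t)(d[6] & 0x0Fu) << 16);
    s.base = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
           | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.limit   = (d[6] & 0x80u) ? (raw << 12) | 0xFFFu : raw;
    s.access  = d[5];
    s.present = (d[5] & 0x80u) != 0;
    s.dpl     = (d[5] >> 5) & 3;
    s.is_code = (d[5] & 0x18u) == 0x18u;
    s.big     = (d[6] & 0x40u) != 0;
    return s;
}
```

- For example, bytes FF FF 00 00 00 9A CF 00 (a flat 32-bit code segment
- with DPL 0) give base 0 and limit 0FFFFFFFFh.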
-
- 01h+flags - TSS: bit 1 - busy, bit 3 - 386 TSS
- 02h - LDT
- 04h+flags - call gate: bit 3 - 386
- 05h - task gate
- 06h+flags - interrupt gate: bit 0 - trap, bit 3 - 386.
-
- For all gates, word [2] holds the selector, and word [0] and word [6]
- hold the offset of the called code (ignored for task gates; word [6]
- is used by 386 gates only); byte [4] holds the count (0-31) of words
- (dwords for 386 gates) to copy in case of an inter-level call (call
- gates only, otherwise ignored). TSS and LDT descriptors have base and
- limit in the same form as code and data segments, and can have bit 7
- set in byte [6] to specify the limit in pages.
- Word [6] should be 0 for a descriptor to mean the same on 286/386.
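-
- A C sketch that fills a 386 call gate (type 0Ch) using the byte
- offsets above; make_call_gate is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

static void make_call_gate(uint8_t g[8], uint16_t sel, uint32_t offset,
                           int dpl, unsigned dword_count)
{
    assert(dpl >= 0 && dpl <= 3 && dword_count <= 31);
    g[0] = (uint8_t)offset;             /* word [0]: offset 15-0 */
    g[1] = (uint8_t)(offset >> 8);
    g[2] = (uint8_t)sel;                /* word [2]: target selector */
    g[3] = (uint8_t)(sel >> 8);
    g[4] = (uint8_t)dword_count;        /* byte [4]: parameters to copy */
    g[5] = (uint8_t)(0x80 | (dpl << 5) | 0x0C); /* P, DPL, 386 call gate */
    g[6] = (uint8_t)(offset >> 16);     /* word [6]: offset 31-16 */
    g[7] = (uint8_t)(offset >> 24);
}
```

- E.g. selector 8, offset 12345678h, DPL 3 and 2 dwords give the bytes
- 78 56 08 00 02 EC 34 12.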
-
- An LDT is similar to the GDT, except that not all descriptor types are
- allowed in it. A TSS holds the entire task state (all registers:
- general, segment, flags, IP, LDTR); it also keeps a link to the calling
- task's TSS (valid if the task was activated by INT or CALL) and stacks
- (SS and [E]SP) for PL 0, 1 and 2 (they are used when more privileged
- code is invoked via a gate from less privileged code). A 386 TSS also
- has a debug trap bit (if set, it causes INT 1 on a task switch to this
- TSS), an I/O bit map (saying which I/O addresses can be accessed when
- CPL>IOPL without a General Protection exception) and the CR3 value for
- the task (so memory can be remapped on a task switch).
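-
- A C sketch of the I/O permission check on 386+ (io_allowed is a
- hypothetical name): a set bit in the map denies access to that port,
- and a multi-byte access needs the bits of all its ports clear. It
- models PM only; in VM86 the map is consulted regardless of IOPL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* bitmap_bytes: how many bytes of the map lie within the TSS limit;
   ports beyond it are treated as denied. */
static bool io_allowed(int cpl, int iopl, const uint8_t *bitmap,
                       size_t bitmap_bytes, uint32_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    if (cpl <= iopl)
        return true;                   /* map not consulted in PM */
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = port + i;
        if (p / 8 >= bitmap_bytes)
            return false;              /* outside the TSS: denied */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;              /* bit set: denied */
    }
    return true;                       /* all bits clear: allowed */
}
```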
-
-
- Page Tables.
-
- Both page directory and page table entries hold the referenced
- (physical) address in bits 31-12, have bits 11-9 available to the
- programmer, and must have bits 8, 7, 4 and 3 set to 0. Bit 5 is called
- A (accessed) and is set by the CPU on access through the entry; bit 6
- is called D (dirty) and is set (in page table entries) if the
- referenced page is written. Bit 0 is called P (present); all other
- bits are ignored if it is not set. Bit 2 allows user (CPL=3) access if
- set, and bit 1 allows the user to write (together with bit 2 only);
- the directory and table entries must both allow an access. For CPL<3,
- reading and writing are allowed for any setting of bits 1 and 2 (no
- protection against the system this way; on 486+, setting the WP bit in
- CR0 makes read-only pages apply to the system too).
- Note the CPU usually caches the page table entries it uses (in the
- TLB): modifying them in memory may cause no mapping change until the
- cache is reloaded. The cache is flushed every time CR3 (which holds the
- address of the page directory) is loaded. Bits 0-11 of CR3 must be 0
- (the directory is page-aligned).
- Addressing through page tables: CR3 + ((Linear_Address SHR 20) AND
- 0FFCh) is the address of the entry in the Page Directory, and that
- entry contains the Page Table address (in bits 31-12); Page Table
- address + ((Linear_Address SHR 10) AND 0FFCh) is the address of the
- entry in the Page Table, and that entry contains the base address of
- the page; combining it with bits 11-0 of Linear_Address gives the
- Physical Address. On a page fault, CR2 is set to the Linear Address
- causing it, and the error code explains the cause.
- Note: when Paging is enabled, CR3 and the page directory and page table
- entries hold Physical Addresses; all other addresses (e.g. the bases in
- GDTR, IDTR and descriptors) are Linear Addresses.
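-
- The walk can be simulated in C over a byte array standing in for
- physical memory. This is a sketch under 386 rules: rd32, wr32 and
- translate are hypothetical names, and A/D bit updates and the TLB are
- not modeled. On failure it stores the page fault error code the CPU
- would push; a real CPU would also load CR2 with the Linear Address.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_P 0x001u  /* present */
#define PG_W 0x002u  /* read/write (for user accesses) */
#define PG_U 0x004u  /* user (CPL 3) access allowed */

/* Little-endian dword access to the simulated physical memory. */
static uint32_t rd32(const uint8_t *mem, uint32_t pa)
{
    return (uint32_t)mem[pa] | ((uint32_t)mem[pa + 1] << 8)
         | ((uint32_t)mem[pa + 2] << 16) | ((uint32_t)mem[pa + 3] << 24);
}

static void wr32(uint8_t *mem, uint32_t pa, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[pa + i] = (uint8_t)(v >> (8 * i));
}

static bool translate(const uint8_t *mem, uint32_t cr3, uint32_t la,
                      bool write, bool user, uint32_t *pa, uint32_t *err)
{
    uint32_t code = (write ? 2u : 0u) | (user ? 4u : 0u);
    uint32_t pde = rd32(mem, (cr3 & 0xFFFFF000u) + ((la >> 20) & 0xFFCu));
    if (!(pde & PG_P)) { *err = code; return false; }
    uint32_t pte = rd32(mem, (pde & 0xFFFFF000u) + ((la >> 10) & 0xFFCu));
    if (!(pte & PG_P)) { *err = code; return false; }
    if (user) {
        uint32_t both = pde & pte;     /* both levels must allow it */
        if (!(both & PG_U) || (write && !(both & PG_W))) {
            *err = code | 1u;          /* protection violation */
            return false;
        }
    }
    *pa = (pte & 0xFFFFF000u) | (la & 0xFFFu);
    return true;
}
```

- With the directory at 1000h, its entry 1 pointing to a table at 2000h,
- and that table's entry 0 mapping page 3000h as user read-only, a user
- read of 00400123h gives 3123h, while a user write faults with code 7.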
-
-
- Switching to Protected Mode and Back to Real Mode.
-
- First, to regain control in case of a crash (CPU reset), store in
- dword [0467h] (0040h:0067h) the far address where control is to be
- passed, and put 0Ah in CMOS register 0Fh (the shutdown status byte):
- CLI; MOV AL,8Fh; OUT 70h,AL; (1us delay) MOV AL,0Ah; OUT 71h,AL
- (bit 7 of the value written to port 70h disables NMI).
- Also: some circuitry in PC compatibles normally disables address line
- A20 (so odd megabytes cannot be addressed); it must be enabled. If you
- use HIMEM, it can be enabled by a request to HIMEM. If you also have
- DOS=HIGH, it is usually already enabled, as any DOS call enables it.
- In other cases, you must write an output port value to the keyboard
- controller to enable it before switching to PM (command 0D1h to port
- 64h, then the new output port value, with bit 1 set, to port 60h).
-
- Switch to PM:
- 1. Load GDTR (required).
- 2. Enable protection by setting bit 0 of CR0/MSW: MOV EAX,CR0;
-    OR AL,1; MOV CR0,EAX on 386+, or SMSW AX; OR AL,1; LMSW AX on 286+.
-    It is recommended to load IDTR immediately before or after the mode
-    switch (the same IDT cannot be valid in both modes).
- 3. Immediately after the mode change, execute a far JMP: it flushes
-    the prefetch queue, which may hold partially decoded instructions
-    (the decoding may be mode dependent), and loads CS.
- 4. Load SS. CS and SS contain invalid selectors after the switch; e.g.
-    an interrupt would push them on the stack and crash on IRET.
- 5. It is also recommended to load all other segment registers (they
-    can be loaded with 0 to contain the null selector, causing an
-    exception if any of them is used to address memory) and LDTR.
- 6. Before the first task switch, load TR (with the selector of a valid,
-    non-busy TSS descriptor; the TSS will be used to store the current
-    state on a task switch).
-
- There is also a BIOS call which switches to PM and changes the
- external interrupt vector mapping (normally the 1st interrupt
- controller uses 08h..0Fh and the 2nd 70h..77h; the 1st range conflicts
- with some CPU exceptions, although an external interrupt can be
- distinguished from an exception); it also enables address line A20.
- See the description of INT 15h, AH=89h.
-
- Returning to RM: this is done by clearing bit 0 of CR0, but it needs
- some preparation. Paging must be disabled (go to code and stack whose
- linear addresses equal their physical addresses, clear the PG bit in
- CR0, clear CR3 to flush the page cache); go to a 16-bit code segment
- with a 64kB limit; and load all segment registers except CS with a
- valid descriptor of a 64kB read/write expand-up byte-granular present
- segment (attribute byte=93h, extended attribute=0). Otherwise you can
- end up in RM with e.g. a read-only or 32kB ES, which will soon cause a
- crash. Load IDTR for RM as well (base 0, limit 3FFh). After clearing
- bit 0 of CR0, execute a far jump to load CS and flush the prefetch
- queue, then load the segment registers with RM values.
-
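- This is not possible on the 80286, which has no CR0 register (and the
- Protect Enable bit cannot be cleared by LMSW). The only way to get back
- to RM is resetting the CPU (with the shutdown code set in CMOS as
- described above, so that the BIOS passes control to the address stored
- at 0467h). It can be done by the following code, which waits until the
- keyboard controller input buffer is empty and then makes the
- controller pulse the reset line:
-
-         CLI
-         XOR CX,CX
- wait_kbd_ctrlr_input_empty:
-         IN AL,64h
-         TEST AL,2
-         LOOPNZ wait_kbd_ctrlr_input_empty
-         MOV AL,0FEh
-         OUT 64h,AL
-         HLT
-
- or by a CPU shutdown (triple fault), which results from an exception
- while servicing a double fault.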
-
- Note that most programs that run the system in VM86 (memory managers
- such as EMM386) provide an interface to switch to PM and back to VM86,
- called VCPI (Virtual Control Program Interface). Its presence can be
- tested, and its functions invoked, by INT 67h with AH=0DEh. It requires
- 3 entries in the GDT to be reserved for the VCPI provider.